A Hybrid Model for Sense Guessing of Chinese Unknown Words
Abstract
This paper proposes a hybrid model for guessing the senses of Chinese unknown words. Three types of similarity (positional, syntactic, and semantic) are analyzed, and a model is developed for each. The three models are then combined into a hybrid one, the HPPS Model. To verify the effectiveness and consistency of HPPS, experiments were conducted on ten test sets collected from two popular Chinese thesauri, Cilin and HowNet. Additional experiments were run on a test set of 2000 words collected from newspapers. The experiments show that the HPPS Model consistently yields an F-score improvement of 4% to 6% over the best results reported in previous research.
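As a rough illustration of how three similarity-based models might be merged into a single hybrid scorer, consider the minimal sketch below. The abstract does not say how the models are combined, so the weighted linear interpolation, the equal weights, the function names (`normalize`, `combine`, `guess_sense`), and the toy sense labels and scores are all assumptions made for illustration. This is not the HPPS Model itself.

```python
from typing import Dict, Tuple

# Candidate sense label -> score assigned by one model (hypothetical format).
Scores = Dict[str, float]


def normalize(scores: Scores) -> Scores:
    """Scale scores so they sum to 1; all-zero input stays all zero."""
    total = sum(scores.values())
    if total == 0:
        return {sense: 0.0 for sense in scores}
    return {sense: value / total for sense, value in scores.items()}


def combine(
    positional: Scores,
    syntactic: Scores,
    semantic: Scores,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> Scores:
    """Assumed combination rule: weighted sum of normalized model scores."""
    models = [normalize(positional), normalize(syntactic), normalize(semantic)]
    senses = set().union(*models)  # every sense proposed by any model
    return {
        sense: sum(w * m.get(sense, 0.0) for w, m in zip(weights, models))
        for sense in senses
    }


def guess_sense(positional: Scores, syntactic: Scores, semantic: Scores) -> str:
    """Return the sense with the highest combined score (ties: alphabetical)."""
    combined = combine(positional, syntactic, semantic)
    return max(sorted(combined), key=combined.get)


if __name__ == "__main__":
    # Toy scores for one unknown word; the values are invented.
    positional = {"animal": 0.6, "plant": 0.4}
    syntactic = {"animal": 0.3, "plant": 0.7}
    semantic = {"animal": 0.8, "tool": 0.2}
    print(guess_sense(positional, syntactic, semantic))
```

In practice the weights would be tuned on held-out data rather than fixed to equal values; a sense that only one model proposes simply receives zero from the other two.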
Related resources
Hybrid Methods for POS Guessing of Chinese Unknown Words
This paper describes a hybrid model that combines a rule-based model with two statistical models for the task of POS guessing of Chinese unknown words. The rule-based model is sensitive to the type, length, and internal structure of unknown words, and the two statistical models utilize contextual information and the likelihood for a character to appear in a particular position of words of a par...
Chinese POS Disambiguation and Unknown Word Guessing with Lexicalized HMMs
This article presents a lexicalized HMM-based approach to Chinese part-of-speech (POS) disambiguation and unknown word guessing (UWG). In order to explore word-internal morphological features for Chinese POS tagging, four types of pattern tags are defined to indicate the way lexicon words are used in a segmented sentence. Such patterns are combined further with POS tags. Thus, Chinese POS disam...
A Method for Automatic POS Guessing of Chinese Unknown Words
This paper proposes a method for automatic POS (part-of-speech) guessing of Chinese unknown words. It contains two models. The first model uses a machine-learning method to predict the POS of unknown words based on their internal component features. The credibility of the results of the first model is then measured. For low-credibility words, the second model is used to revise the first model’s ...
Hybrid Models for Chinese Unknown Word Resolution Dissertation
Word segmentation, part-of-speech (POS) tagging, and sense tagging are important steps in various Chinese natural language processing (CNLP) systems. Unknown words, i.e., words that are not in the dictionary or training data used in a CNLP system, constitute a major challenge for each of these steps. This dissertation is concerned with developing hybrid models that effectively combine statistic...
Publication date: 2009